This document is a compilation of NCEAS training material on version control to create a 45-60 min crash course on getting started with version control for RStudio users. You will learn:
Version control is a system that helps you manage the versions of your files. It saves you from duplicating files with save as to keep different versions of a file (see below). Version control helps you create a timeline of snapshots containing the different versions of a file. As a bonus, you can add a short description to remember what each specific version is about.
For scientists, version control is a useful tool to track the changes you make to your scripts and to share your code with your collaborators. For example, if you break your code, git can help you revert to an earlier working version. Or perhaps you want one of your collaborators to add a new feature to your code to improve your analysis: version control lets you do so in a smooth and organized manner, tracking who changed what in the script.
This training material focuses on the code versioning system called Git. Note that there are others, such as Mercurial or Subversion (svn).
Git is a free and open source distributed version control system. It has many functionalities and was originally geared towards software development and production environments. In fact, Git was initially designed and developed in 2005 by Linux kernel developers (including Linus Torvalds) to track the development of the Linux kernel. Here is a fun video of Linus Torvalds touting Git to Google.
Git can be enabled on a specific folder/directory on your file system to version files within that directory (including sub-directories). In git (and other version control systems) terms, this “tracked folder” is called a repository (which formally is a specific data structure storing versioning information).
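As a minimal sketch of what a repository looks like on disk, the commands below turn an empty folder into a repository from the command line. The temporary folder and the variable name repo_dir are placeholders for illustration only. The versioning information ends up in a hidden .git sub-directory.

```shell
# Create a throwaway folder (placeholder for one of your project folders)
repo_dir="$(mktemp -d)"

# Turn the folder into a git repository
git init --quiet "$repo_dir"

# The versioning information is stored in the hidden .git sub-directory
ls -a "$repo_dir"
```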
macOS and most Linux computers come with git pre-installed, but it is not always directly usable. The best way to test if git is ready to use is at the command line:
git --version
## git version 2.17.2 (Apple Git-113)
It should return something like the above. If you get an error, you will have to install git.
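As a small sketch, the same check can be scripted so that it prints a hint instead of an error when git is missing. The variable name git_check is only illustrative.

```shell
# Look for git on the PATH; report its version if found
if command -v git >/dev/null 2>&1; then
  git_check="found: $(git --version)"
else
  git_check="not found: install git from https://git-scm.com/downloads"
fi
echo "$git_check"
```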
Windows users will have to install software called Git Bash before being able to use git.
You can download a copy of git here: https://git-scm.com/downloads and follow the instructions.
You can keep the default options during the installation until you reach Configuring the terminal emulator to use with Git Bash; at that step, be sure Use MinTTY is selected. This will install both git and a set of useful command-line tools using a trimmed down Bash shell.
Depending on the version, you might have to run a few commands from the terminal. Please refer to the README.txt that comes with the download regarding the exact steps to follow.
git identity
Before you start using git on any computer, you will have to set your identity on your system, as every snapshot of files is associated with the user who made the modifications to the file(s).
Open the Terminal or git bash and then type the following commands.
Your name and email:
git config --global user.name "your Full Name"
git config --global user.email "your Email"
Check that everything is correct:
git config --global --list
Modify everything at the same time:
git config --global --edit
Set your text editor:
git config --global core.editor nano
Here nano is used as an example; you can choose most of the text editors you might have installed on your computer (atom, sublime, notepad++ …).
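The identity settings above can also be stored for a single repository by leaving out --global, in which case git config writes to that repository only. This is a minimal sketch, not a required step: the throwaway repository, the variable name cfg_repo, and the name and email values are all placeholders.

```shell
# Throwaway repository (placeholder)
cfg_repo="$(mktemp -d)"
git init --quiet "$cfg_repo"
cd "$cfg_repo"

# Without --global, the setting applies to this repository only
git config user.name "Your Full Name"
git config user.email "you@example.org"

# Read a single value back
git config --get user.email
```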
Problems with any of those steps? Check out Jenny Bryan’s Happy Git troubleshooting section: http://happygitwithr.com/troubleshooting.html
In most cases, RStudio should automatically detect git when it is installed on your computer. The best way to check this is to go to the Tools menu -> Global Options and click on Git/SVN
If git is properly set up, the window should look like this:
Click OK.
Note: if git was not enabled, you might be asked to restart RStudio to enable it.
Although there are many ways to start a new repository, GitHub (or any other cloud solution, such as GitLab) provides one of the most convenient ways of starting a repository.
Let’s distinguish between git and GitHub:
GitHub is a company that hosts git repositories online and provides several collaboration features (including forking). GitHub fosters a great user community and has built a nice web interface to git, also adding great visualization and rendering capabilities for your data.
This screen shows the copy of a repository stored on GitHub, with its list of files, when the files and directories were last modified, and some information on who made the most recent changes.
If we drill into the “commits” for the repository, we can see the history of changes made to all of the files. Looks like kellijohnson and seananderson were fixing things in June and July:
And finally, if we drill into the changes made on June 13, we can see exactly what was changed in each file:
Tracking these changes and how they relate to released versions of software and files is exactly what Git and GitHub are good for. We will show how they can be really effective for tracking versions of scientific code, figures, and manuscripts to accomplish a reproducible workflow.
We are going to create a new repository on your GitHub account. If you do not have an account yet, it is free to create one here: https://github.com/join?source=header-home
To create a new repository follow these steps:
- Name your repository, for example myfirst-repo (use - instead of spaces or _)
- Add a .gitignore file (optional). As the name suggests, the .gitignore file is used to specify the file formats that git should not track. GitHub offers pre-written .gitignore files for convenience.
Here is a website to look for more pre-written .gitignore files: https://github.com/github/gitignore
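To illustrate how a .gitignore file works, here is a minimal sketch in a throwaway repository. The three patterns are ones often ignored in R projects, and the file names and the variable ignore_repo are placeholders chosen for this example.

```shell
# Throwaway repository (placeholder)
ignore_repo="$(mktemp -d)"
git init --quiet "$ignore_repo"
cd "$ignore_repo"

# A few patterns often ignored in R projects
cat > .gitignore <<'EOF'
.Rhistory
.RData
.Rproj.user/
EOF

# Create one file that matches a pattern and one that does not
touch .RData analysis.R

# Untracked files as git sees them: .RData is not listed
git status --short
```

If you are unsure whether a given file is ignored, git check-ignore -v followed by the file name reports the pattern that matches it.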
=> Here it is, you now have a repository in the cloud!!
The next step is to get a local copy of this repository on your personal computer. In git jargon, creating an exact copy of a repository on your local computer is called cloning.
RStudio can help us to clone a repository. Since RStudio Projects also work at the folder/directory level, it is the “unit” that is going to be used to link a repository to RStudio.
- Click on the RStudio Project menu in the upper-right corner of the RStudio IDE window and choose New Project
- Select git
- On GitHub, click on the clone or download button and copy the URL to your repository
- Click Create Project
=> Congratulations!! You have cloned the repository to your computer and created an RStudio project out of it.
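For reference, the same cloning step can be done at the command line with git clone. The sketch below is an assumption-laden stand-in rather than the exact workflow above: so that it runs without a GitHub account, a local bare repository plays the role of the GitHub copy. In practice, remote_url would be the URL copied from your repository page. The names seed, remote_url, and work_dir are illustrative.

```shell
# Stand-in for the GitHub copy: a local bare repository holding a README.md
# (on GitHub this would be the URL copied from the repository page)
seed="$(mktemp -d)"
git init --quiet "$seed"
echo "# my-repo-name" > "$seed/README.md"
git -C "$seed" add README.md
git -C "$seed" -c user.name="Demo User" -c user.email="demo@example.org" \
  commit --quiet -m "Initial commit"
remote_url="$(mktemp -d)/my-repo-name.git"
git clone --quiet --bare "$seed" "$remote_url"

# Clone the repository into a local working folder
work_dir="$(mktemp -d)"
cd "$work_dir"
git clone --quiet "$remote_url"

# The clone is named after the repository and contains its files
ls my-repo-name
```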
You can also use your computer file browser to look at the files in the repository. You have two files:
- my-repo-name.Rproj file for the RStudio Project you just created. Note that because we left the second box empty on step 5, the name of the repository was used to name the RStudio project.
- README.md file that was automatically generated by GitHub when creating the repository
If you look again at your repository page on GitHub, you will notice that the .Rproj file is not there. This is because the file was created by RStudio on your local machine and you have not yet tried to synchronize the files between your local copy and the one in the cloud (the remote copy in git jargon). Note also that the .gitignore file does not show up in the Finder view. This is because files with a name starting with a dot are considered “hidden”, and by default most operating systems will not show them. However, if you use the Files panel in RStudio, you can see the .gitignore file.
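Hidden files can also be listed from the Terminal or Git Bash. The following sketch uses a throwaway folder with placeholder files (the variable name files_dir is illustrative) to show the difference between a plain listing and one that includes dot files.

```shell
# Throwaway folder with one regular and one hidden file (placeholders)
files_dir="$(mktemp -d)"
touch "$files_dir/README.md" "$files_dir/.gitignore"

# A plain listing skips names that start with a dot
ls "$files_dir"

# The -a flag lists hidden files as well
ls -a "$files_dir"
```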
We are going to edit the README.md file, adding more information about the repository (the purpose of this file). You can directly edit this file in RStudio. You can open the file by clicking on its name from the Files tab in the lower-right panel.
You modify files in your working directory and save them as usual
You add snapshots of your changed files to your staging area
You do a commit, which takes the files as they are in the staging area and permanently stores them as snapshots to your Git directory.
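The three steps above can be sketched at the command line as follows. This is an illustration in a throwaway repository, not the RStudio workflow itself. The variable name cycle_repo, the file content, the commit message, and the demo identity are placeholders.

```shell
# Throwaway repository (placeholder)
cycle_repo="$(mktemp -d)"
git init --quiet "$cycle_repo"
cd "$cycle_repo"
# Identity set locally so the sketch runs even without a global identity
git config user.name "Demo User"
git config user.email "demo@example.org"

# 1. Modify a file in the working directory and save it
echo "# my-repo-name" > README.md
echo "Purpose: crash course on version control" >> README.md

# 2. Add a snapshot of the changed file to the staging area
git add README.md
git status --short   # staged files are flagged in the first column

# 3. Commit: store the staged snapshot with a short description
git commit --quiet -m "Describe the purpose of the repository"
git log --oneline
```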
We can make an analogy with taking a family picture, where each family member would represent a file.
This two-step process enables you to flexibly group files into a specific commit.
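As a sketch of this grouping, the example below changes three files but stages and commits only the two that belong together. The file names, the variable group_repo, and the demo identity are hypothetical.

```shell
# Throwaway repository (placeholder)
group_repo="$(mktemp -d)"
git init --quiet "$group_repo"
cd "$group_repo"
# Identity set locally so the sketch runs even without a global identity
git config user.name "Demo User"
git config user.email "demo@example.org"

# Three changed files in the working directory (hypothetical names)
echo "x <- 1" > analysis.R
echo "plot(1)" > figure.R
echo "draft" > notes.txt

# Stage only the two files that belong together, then commit them
git add analysis.R figure.R
git commit --quiet -m "Add analysis and figure scripts"

# Files in the commit: analysis.R and figure.R
git ls-tree -r --name-only HEAD

# notes.txt is still untracked and can go into a later commit
git status --short
```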
These steps are repeated for every version you want to keep (every time you would otherwise use save as). Every time you commit, you create a new snapshot: the new version of the file is added to the git database, while all the previous versions are kept. This creates a history of the content of your repository that is like a graph you can navigate:
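To make the idea of a navigable history concrete, here is a minimal sketch that commits two versions of a file and then looks back at the earlier one. The file name, its content, the variable hist_repo, and the demo identity are placeholders.

```shell
# Throwaway repository (placeholder)
hist_repo="$(mktemp -d)"
git init --quiet "$hist_repo"
cd "$hist_repo"
# Identity set locally so the sketch runs even without a global identity
git config user.name "Demo User"
git config user.email "demo@example.org"

# First version
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "First version of the analysis"

# Second version
echo "x <- 2" > analysis.R
git add analysis.R
git commit --quiet -m "Change the value of x"

# Navigate the history: one line per snapshot, newest first
git log --oneline

# Look at the file as it was one commit ago, without changing the current version
git show HEAD~1:analysis.R
```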
GitHub